Cross-modal chiastopic-fusion attention network for visual question answering
Mao WANG, Yaxiong PENG, Anjiang LU
Journal of Computer Applications, 2022, 42(3): 854-859. DOI: 10.11772/j.issn.1001-9081.2021030470

To improve the accuracy of Visual Question Answering (VQA) models on complex image questions, a Cross-modal Chiastopic-fusion Attention Network (CCAN) was proposed. Firstly, an improved residual channel self-attention method was proposed to attend to the image and locate important regions according to its global information; on this basis, a new joint attention mechanism combining word attention and image-region attention was introduced. Secondly, a cross-modal chiastopic-fusion network was proposed to generate multiple features that integrate the two dynamic information flows, producing an effective attention flow in each modality; the joint features were combined by element-wise multiplication. In addition, parameters were shared between the networks to avoid an increase in computational cost. Experimental results on the VQA v1.0 dataset show that the proposed model reaches an accuracy of 67.57%, which is 2.97 percentage points higher than that of the MLAN (Multi-Level Attention Network) model and 1.20 percentage points higher than that of the CAQT (Co-Attention network with Question Type) model. These results verify the effectiveness and robustness of the proposed method in improving VQA accuracy.
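The fusion step the abstract outlines (image-region attention, word attention, element-wise multiplication of the attended features, with shared parameters to limit cost) can be sketched roughly as below. This is a minimal illustrative sketch only: the feature dimensions, layer names, and fusion order are assumptions, not the authors' released implementation of CCAN.

```python
# Minimal sketch of joint attention with element-wise fusion
# (dimensions and module names are assumed for illustration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointAttentionFusion(nn.Module):
    """Attend over image regions and question words, then fuse the two
    attended summaries by element-wise multiplication."""
    def __init__(self, img_dim=2048, q_dim=1024, hid_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hid_dim)
        self.q_proj = nn.Linear(q_dim, hid_dim)
        # One scoring layer shared by both branches, echoing the abstract's
        # parameter sharing to avoid extra computational cost.
        self.score = nn.Linear(hid_dim, 1)

    def attend(self, feats, proj):
        h = torch.tanh(proj(feats))           # (B, N, hid)
        w = F.softmax(self.score(h), dim=1)   # (B, N, 1) attention weights
        return (w * h).sum(dim=1)             # (B, hid) attended summary

    def forward(self, img_regions, q_words):
        v = self.attend(img_regions, self.img_proj)  # image-region attention
        q = self.attend(q_words, self.q_proj)        # word attention
        return v * q                                 # element-wise joint feature

# Usage: batch of 8 images with 36 region features and 14 word features.
fused = JointAttentionFusion()(torch.randn(8, 36, 2048), torch.randn(8, 14, 1024))
```

Element-wise multiplication keeps the joint feature at the same dimensionality as each attended branch, which is why it is a common low-cost choice for combining two modalities compared with concatenation or bilinear pooling.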
